
    Knowledge Author: Facilitating user-driven, domain content development to support clinical information extraction

    Background: Clinical Natural Language Processing (NLP) systems require a semantic schema composed of domain-specific concepts, their lexical variants, and associated modifiers to accurately extract information from clinical texts. An NLP system uses this schema to structure concepts and extract meaning from free text. In the clinical domain, creating a semantic schema typically requires input from both a domain expert, such as a clinician, and an NLP expert who translates the clinical concepts drawn from the clinician's expertise into a computable format that an NLP system can use. The goal of this work is to develop a web-based tool, Knowledge Author, that bridges the gap between the clinical domain expert and NLP system development by facilitating the creation of domain content, represented in a semantic schema, for extracting information from clinical free text. Results: Knowledge Author is a web-based recommendation system that supports users in developing the domain content needed by clinical NLP applications. Knowledge Author's schematic model leverages a set of semantic types derived from the Secondary Use Clinical Element Models and the Common Type System to allow users to quickly create and modify domain-related concepts. Features such as collaborative development and domain content suggestions, provided by mapping concepts to the Unified Medical Language System Metathesaurus database, further support the domain content creation process. Two proof-of-concept studies were performed to evaluate the system's performance. The first study evaluated Knowledge Author's flexibility in creating a broad range of concepts: of a dataset of 115 concepts, 87 (76%) could be created using Knowledge Author.
The second study evaluated the effectiveness of Knowledge Author's output in an NLP system: Knowledge Author and the NLP system pyConText were used to extract concepts and associated modifiers representing a clinical element, carotid stenosis, from 34 clinical free-text radiology reports. Knowledge Author's domain content produced high recall for concepts (targeted findings: 86%) and varied recall for modifiers (certainty: 91%; sidedness: 80%; neurovascular anatomy: 46%). Conclusion: Knowledge Author can support clinical domain content development for information extraction by enabling domain experts to create semantic schemas.
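    The abstract describes a semantic schema built from concepts, their lexical variants, and modifiers, evaluated by recall. The Python sketch below is a hypothetical illustration of that idea only: the `Concept` and `Modifier` classes, the cue phrases, and the naive whole-phrase matching are assumptions made for this example. It is not Knowledge Author's export format and not pyConText's ConText algorithm, which applies scope rules to modifier cues.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical schema entry: a target concept plus its lexical variants."""
    name: str
    variants: list[str] = field(default_factory=list)

    def surface_forms(self) -> list[str]:
        return [self.name, *self.variants]


@dataclass
class Modifier:
    """Hypothetical modifier category mapping each value to its cue phrases."""
    category: str
    cues: dict[str, list[str]] = field(default_factory=dict)


def _contains_phrase(text: str, phrase: str) -> bool:
    # Case-insensitive whole-phrase match bounded by word boundaries.
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def find_concept(text: str, concept: Concept) -> bool:
    """True if any surface form (name or variant) of the concept occurs in the text."""
    return any(_contains_phrase(text, form) for form in concept.surface_forms())


def assign_modifier(text: str, modifier: Modifier, default: str) -> str:
    """Return the first modifier value whose cue occurs in the text, else the default."""
    for value, cues in modifier.cues.items():
        if any(_contains_phrase(text, cue) for cue in cues):
            return value
    return default


def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN); 0.0 when there is nothing to find."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


if __name__ == "__main__":
    stenosis = Concept("carotid stenosis", ["ICA stenosis", "carotid narrowing"])
    certainty = Modifier("certainty", {"negated": ["no evidence of", "without"]})
    sidedness = Modifier("sidedness", {"left": ["left"], "right": ["right"]})

    sentence = "No evidence of ICA stenosis on the left."
    print(find_concept(sentence, stenosis))                     # True
    print(assign_modifier(sentence, certainty, "affirmed"))     # negated
    print(assign_modifier(sentence, sidedness, "unspecified"))  # left
```

    Plain phrase matching like this cannot tell which finding a cue applies to when a sentence mentions several, which is the problem that scope-aware systems such as pyConText address.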

    Surface salinity measurements - COSMOS 2005 experiment in the Bay of Biscay

    12 pages, 7 figures, 2 tables. Sea surface salinity (SSS) data were collected in the Bay of Biscay between April and November 2005. The major source of data is 15 surface drifters deployed during the COSMOS experiment in early April and early May 2005 [12 from the Scripps Institution of Oceanography (SIO) and 3 from METOCEAN]. These data are complemented by thermosalinograph (TSG) data from four French research vessels and four merchant vessels, by salinity profiles collected by Argo profiling floats and CTD casts, and by surface samples collected during two cruises. Time during the two cruises was dedicated to direct inspection of the drifters, recovering some of them and providing validation data. This dataset provides a unique opportunity to estimate the accuracy of the SSS data and to evaluate the long-term performance of the drifter salinity measurements. Some of the TSG SSS data were noisy, presumably because of bubbles. The TSG data from the research vessels needed to be corrected for biases, which were commonly larger than 0.1 pss-78 (Practical Salinity Scale 1978) and which in some instances evolved quickly from day to day. These corrections are possible only when samples were collected or ancillary data (e.g., CTD profiles) are available. The resulting accuracy of the corrected TSG dataset, which varies strongly in time, is discussed. The surface drifter SSS data showed anomalous daytime values on days with strong surface warming; these data had to be excluded from the dataset. The drifter SSS showed initial biases in the range −0.026 to 0.009 pss-78. The (usually) negative bias increased by an average of −0.007 pss-78 during the average 65-day period before the COSMOS-2 cruise on 22–27 June.
High chlorophyll concentrations derived from satellite ocean color, and therefore a high density of phytoplankton cells, are observed in Medium Resolution Imaging Spectrometer (MERIS)/Moderate Resolution Imaging Spectroradiometer (MODIS) composites during part of the period, in particular in late April or early May. No correlation was found between the change in bias and the estimated surface chlorophyll. The evolution during the following summer months is harder to ascertain: for three buoys there is little change in bias, but for two others the bias could have increased by up to 0.03 or 0.04 pss-78 during July–August. Seven drifters were recovered in the autumn, providing recovery or postrecovery estimates of the biases; these suggest a large (0.02–0.03 pss-78) increase in bias during the autumn months in three of the seven cases, but no significant increase for the other four drifters. Funding for the research was provided by CNES. Peer reviewed.
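    The abstract notes that TSG biases can exceed 0.1 pss-78, can change from day to day, and can be corrected only where reference samples or CTD profiles exist. The Python sketch below shows one way to apply such a correction. It assumes the bias at each calibration time is the TSG-minus-reference difference and that the bias varies linearly between calibration times; both are assumptions made here, not the paper's stated procedure. The function names and the numbers in the example are hypothetical.

```python
from bisect import bisect_left


def calibration_offsets(tsg_at_samples, reference_salinity):
    """Bias at each calibration point: TSG salinity minus reference salinity (pss-78)."""
    return [tsg - ref for tsg, ref in zip(tsg_at_samples, reference_salinity)]


def offset_at(t, sample_times, offsets):
    """Linearly interpolate the bias at time t between calibration points.

    Outside the calibrated window the nearest end value is held constant
    (itself an assumption); sample_times must be strictly increasing.
    """
    if t <= sample_times[0]:
        return offsets[0]
    if t >= sample_times[-1]:
        return offsets[-1]
    i = bisect_left(sample_times, t)
    t0, t1 = sample_times[i - 1], sample_times[i]
    b0, b1 = offsets[i - 1], offsets[i]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


def correct_tsg(times, tsg_series, sample_times, offsets):
    """Subtract the interpolated bias from every TSG reading."""
    return [s - offset_at(t, sample_times, offsets) for t, s in zip(times, tsg_series)]


if __name__ == "__main__":
    # Hypothetical data: time in days, salinity in pss-78.
    sample_times = [0.0, 2.0]
    offsets = calibration_offsets([35.20, 35.30], [35.05, 35.10])  # about 0.15 and 0.20
    corrected = correct_tsg([0.0, 1.0, 2.0], [35.20, 35.25, 35.30], sample_times, offsets)
    print([round(s, 3) for s in corrected])  # [35.05, 35.075, 35.1]
```

    For scale, the average drifter drift reported above, −0.007 pss-78 over about 65 days, corresponds to roughly −1.1 × 10⁻⁴ pss-78 per day.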

    Developing a web-based SKOS editor

    BACKGROUND: The Simple Knowledge Organization System (SKOS) was introduced to the wider research community by a 2005 World Wide Web Consortium (W3C) working draft and further developed and refined in a 2009 W3C recommendation. Since then, SKOS has become the de facto standard for representing and sharing thesauri, lexicons, vocabularies, taxonomies, and classification schemes. In this paper, we describe the development of a free, open-source, web-based SKOS editor built for the development, curation, and management of small to medium-sized lexicons for health-related Natural Language Processing (NLP). RESULTS: The web-based SKOS editor allows users to create, curate, version, manage, and visualise SKOS resources. We tested the system against five widely used, publicly available SKOS vocabularies of various sizes and found that the editor is suitable for the development and management of small to medium-sized lexicons. Qualitative testing has focussed on using the editor to develop lexical resources that drive NLP applications in two domains: first, developing a lexicon to support an Electronic Health Record-based NLP system for the automatic identification of pneumonia symptoms; and second, creating a taxonomy of lexical cues associated with Diagnostic and Statistical Manual of Mental Disorders (DSM-5) diagnoses, with the goal of facilitating the automatic identification of symptoms associated with depression in short, informal texts. CONCLUSIONS: To the best of our knowledge, the SKOS editor we have developed is the first free, open-source, web-based SKOS editor capable of creating, curating, versioning, managing, and visualising SKOS lexicons.
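    In SKOS, a lexicon is a set of `skos:Concept` resources with preferred and alternative labels, linked into a hierarchy by `skos:broader`. The Python sketch below illustrates the kind of resource such an editor manages; it is not the editor's own code or storage format. It serialises a tiny, hypothetical pneumonia-symptom lexicon to Turtle using only the standard library. The `ex:` namespace, the base URI, the concept identifiers, and the labels are all made up for this example.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def turtle_escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_skos_turtle(base_uri: str, scheme_id: str, concepts: dict, lang: str = "en") -> str:
    """Serialise concepts to Turtle.

    `concepts` maps a local id to {"pref": str, "alt": [str, ...], "broader": id or None}.
    """
    lines = [SKOS_PREFIX, f"@prefix ex: <{base_uri}> .", "", f"ex:{scheme_id} a skos:ConceptScheme ."]
    for cid, c in concepts.items():
        lines.append("")
        lines.append(f"ex:{cid} a skos:Concept ;")
        lines.append(f"    skos:inScheme ex:{scheme_id} ;")
        for alt in c.get("alt", []):
            lines.append(f'    skos:altLabel "{turtle_escape(alt)}"@{lang} ;')
        if c.get("broader"):
            lines.append(f"    skos:broader ex:{c['broader']} ;")
        # The preferred label is emitted last so it can close the statement with a period.
        lines.append(f'    skos:prefLabel "{turtle_escape(c["pref"])}"@{lang} .')
    return "\n".join(lines)


if __name__ == "__main__":
    lexicon = {
        "symptom": {"pref": "symptom"},
        "cough": {"pref": "cough", "alt": ["coughing", "productive cough"], "broader": "symptom"},
        "fever": {"pref": "fever", "alt": ["pyrexia", "febrile"], "broader": "symptom"},
    }
    print(to_skos_turtle("http://example.org/lexicon#", "pneumoniaSymptoms", lexicon))
```

    Loading the output into an RDF library such as rdflib is a simple way to confirm that it parses as valid Turtle.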